Causal models have no complete axiomatic characterization 

Sanjiang Li 
Department of Computer Science and Technology, 
Tsinghua University, Beijing 100084, China 
lisanjiang@tsinghua.edu.cn

Abstract

Markov networks and Bayesian networks are effective graphical representations of the
dependencies embedded in probabilistic models. It is well known that the independencies captured
by Markov networks (called graph-isomorphs) have a finite axiomatic characterization. This paper,
however, shows that the independencies captured by Bayesian networks (called causal models)
cannot be axiomatized even by countably many Horn or disjunctive clauses. The reason is that a
sub-independency model of a causal model need not be causal, whereas graph-isomorphs are closed
under sub-models.
1 Introduction

The notion of conditional independence (CI) plays a fundamental role in probabilistic reasoning. In 
traditional theories of probability, to decide if a CI statement holds, we need to check whether two 
conditional probabilities are equal, which requires summation over an exponentially large number of
variable combinations. This numerical approach is clearly impractical. An alternative qualitative 
approach is very popular in artificial intelligence, where new CI statements can be derived logically 
without reference to numerical quantities. Given an initial set of independence relations, a fixed 
(finite) set of axioms can be used to infer new independencies by logical manipulations. 

A natural question arises: can CI relations be completely characterized by a finite set of axioms
(also called inference rules)? Pearl and Paz [5] introduced the concept of semi-graphoid as an
independency model that satisfies four specific axioms, and showed that each CI relation is a
semi-graphoid. Later, Studeny [6] gave a negative answer to this question. But, more positively, he
also showed that (i) CI relations have a characterization by a countable set of axioms [6]; and
(ii) every probabilistically sound axiom with at most two antecedents is a consequence of the
semi-graphoid axioms [7].

Although CI relations in general have no complete axiomatic characterization, Geiger and Pearl 
[2] developed complete axiomatizations for saturated independence and marginal independence - 
two special families of CI relations. 

Graphs are the most common metaphors for communicating and reasoning about dependencies. It is
not surprising that graphical models are a very popular way of specifying independence
constraints. There are in general two kinds of graphical models: Markov networks and Bayesian
networks. A Markov network is an undirected graph, while a Bayesian network is a directed acyclic
graph (DAG). Geiger and Pearl [5] developed an axiomatic basis for the relationships between CI
and graphical models in statistical analysis. They showed in particular that (i) every axiom for
conditional independence is also an axiom for graph separation; and (ii) every graph represents a
consistent set of independence and dependence constraints. Moreover, an early work of Pearl and
Paz [5] gave an axiomatic characterization for CI relations captured by undirected graphs (called
graph-isomorphs). It was also conjectured [3] that CI relations captured by DAGs (called causal
models) may have no finite axiomatic characterization.

In this paper, we confirm this conjecture and show that causal models have no complete
characterization by any (finite or countable) set of (Horn or disjunctive) axioms. We achieve this
by showing that a sub-model of a causal model need not be causal. This contrasts with CI relations
and graph-isomorphs, both of which are closed under sub-models.

It came to our attention very late that the same observation was made in [8, Remark 3.5], where
Studeny gave only a basic argument. This paper provides a complete proof of this observation.

The remainder of this paper is organized as follows. Section 2 provides preliminary definitions
for independency models, CI relations, graph-isomorphs, and causal models. Section 3 gives
syntactic and semantic descriptions of independency logic, and then formalizes the notion of
axiomatization. Section 4 discusses the heredity property of independency models. Further
discussion is given in the last section.

2 Preliminaries 

In this section we introduce the basic notions used in this paper. Our references are [3, 8]. In
what follows, unless otherwise stated, we assume $U$ is a finite set, and write $p(U)$ for the
powerset of $U$.

The notion of conditional independency (CI) plays a fundamental role in probabilistic reasoning. 

Definition 2.1 (conditional independency, CI). Let $U$ be a finite set of variables with discrete
values. Let $P(\cdot)$ be a joint probability function over the variables in $U$. For three disjoint
subsets $X, Y, Z$ of $U$, $X$ and $Y$ are said to be conditionally independent given $Z$ if for all
values $x$, $y$ and $z$ such that $P(y, z) > 0$ we have $P(x \mid y, z) = P(x \mid z)$.
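The condition in Definition 2.1 can be checked directly on a small joint distribution. The following is a minimal sketch, not part of the original development: the helper names `marginal` and `is_ci`, the variable names `A`, `B`, `C`, and the chain-shaped example distribution are all assumptions made for illustration. Exact rational arithmetic is used so that the equality test is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

def marginal(joint, variables, keep):
    """Sum the joint distribution down to the variables listed in `keep`."""
    idx = [variables.index(name) for name in keep]
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out

def is_ci(joint, variables, X, Z, Y):
    """Check P(x | y, z) = P(x | z) for all x, y, z with P(y, z) > 0."""
    p_xyz = marginal(joint, variables, X + Y + Z)
    p_yz = marginal(joint, variables, Y + Z)
    p_xz = marginal(joint, variables, X + Z)
    p_z = marginal(joint, variables, Z)
    nx, ny = len(X), len(Y)
    for yz, pyz in p_yz.items():
        if pyz == 0:
            continue  # the definition only constrains cases with P(y, z) > 0
        y, z = yz[:ny], yz[ny:]
        for xz, pxz in p_xz.items():
            if xz[nx:] != z:
                continue
            x = xz[:nx]
            lhs = p_xyz.get(x + y + z, Fraction(0)) / pyz  # P(x | y, z)
            rhs = pxz / p_z[z]                             # P(x | z)
            if lhs != rhs:
                return False
    return True

# Hypothetical example: a chain A -> B -> C over binary variables, where each
# variable copies its predecessor with probability 3/4.
variables = ["A", "B", "C"]
joint = {}
for a, b, c in product((0, 1), repeat=3):
    p_b = Fraction(3, 4) if b == a else Fraction(1, 4)
    p_c = Fraction(3, 4) if c == b else Fraction(1, 4)
    joint[(a, b, c)] = Fraction(1, 2) * p_b * p_c

chain_holds = is_ci(joint, variables, ["A"], ["B"], ["C"])   # A and C given B
marginal_holds = is_ci(joint, variables, ["A"], [], ["C"])   # A and C given nothing
```

Even in this three-variable case the check sums over every joint assignment, which illustrates why the numerical approach becomes impractical as the number of variables grows.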

We use the notation $I(X, Z, Y)_P$ to denote the conditional independency of $X$ and $Y$ given $Z$.
The set of all these CI statements forms a ternary relation on $p(U)$, called a CI relation. In
general, we have

Definition 2.2 (independency model [3]). An independency model $M$ defined on $U$ is a ternary
relation on $p(U)$ which satisfies the following condition:

$$(A, C, B) \in M \;\Rightarrow\; A, B, C \text{ are pairwise disjoint.} \tag{1}$$

A tuple $(A, C, B)$ in $M$ (out of $M$, resp.) is called an independence statement (a dependence
statement, resp.). We write $I(A, C, B)_M$ to indicate the fact that $(A, C, B)$ is in $M$.

Two other classes of independency models arise from graphs, where the notion of separation 
plays a key role. 

Definition 2.3 (graph separation [3]). If $A$, $B$ and $C$ are three disjoint subsets of nodes in
an undirected graph $G$, then $C$ is said to separate $A$ from $B$, denoted $(A|C|B)_G$, if along
every path between a node in $A$ and a node in $B$ there is a node in $C$.

The independency model consisting of all graph separation instances in G is a graph-isomorph. 

Definition 2.4 (graph-isomorph [3]). An independency model $M$ is said to be a graph-isomorph
if there exists an undirected graph $G = (U, E)$ such that for every three disjoint subsets
$A, B, C$ of $U$, we have

$$I(A, C, B)_M \;\Leftrightarrow\; (A|C|B)_G. \tag{2}$$
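Definitions 2.3 and 2.4 can be made concrete with a short sketch that tests separation by a graph search and then enumerates the graph-isomorph of a small graph. The function names `separates` and `graph_isomorph` and the three-node path graph are assumptions chosen for illustration; the adjacency-dictionary encoding is likewise just one convenient choice.

```python
from itertools import product

def separates(adj, A, C, B):
    """Return True if every path from a node in A to a node in B meets C."""
    seen = set(A)
    stack = list(A)
    while stack:
        for n in adj[stack.pop()]:
            # Never walk through a node of C: such paths are already cut.
            if n not in seen and n not in C:
                seen.add(n)
                stack.append(n)
    return not (seen & set(B))

def graph_isomorph(adj):
    """Collect all (A, C, B), pairwise disjoint, with (A|C|B)_G true."""
    nodes = sorted(adj)
    model = set()
    # Label each node: 0 -> A, 1 -> C, 2 -> B, 3 -> in none of the three sets.
    for labels in product(range(4), repeat=len(nodes)):
        A, C, B = (frozenset(n for n, lab in zip(nodes, labels) if lab == k)
                   for k in range(3))
        if separates(adj, A, C, B):
            model.add((A, C, B))
    return model

# Hypothetical example graph: the path 1 - 2 - 3.
path = {1: {2}, 2: {1, 3}, 3: {2}}
M = graph_isomorph(path)
```

On this path graph the middle node separates the two end nodes, while the empty set does not, so the first statement belongs to `M` and the second does not.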

For directed acyclic graphs, a similar separation property was defined. 

Definition 2.5 (d-separation [3]). If $A$, $B$ and $C$ are three disjoint subsets of nodes in a
DAG $D$, then $C$ is said to d-separate $A$ from $B$, denoted $(A|C|B)_D$, if along every path
between a node in $A$ and a node in $B$ there is a node $w$ satisfying one of the following two
conditions:

• $w$ has converging arrows and neither $w$ nor any of its descendants is in $C$; or

• $w$ does not have converging arrows and $w$ is in $C$.

The independency model consisting of all d-separation instances in a DAG $D$ is a causal model.
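The two blocking conditions of Definition 2.5 can be checked path by path. The sketch below follows the definition literally by enumerating all simple paths, which is exponential in general and only meant for very small DAGs; the function name `d_separated`, the edge-list encoding, and the two example DAGs are assumptions made for illustration.

```python
def d_separated(nodes, edges, A, C, B):
    """Check (A|C|B)_D by testing every simple path against Definition 2.5."""
    edge_set = set(edges)                      # (u, v) means u -> v
    children = {n: set() for n in nodes}
    nbrs = {n: set() for n in nodes}
    for u, v in edges:
        children[u].add(v)
        nbrs[u].add(v)
        nbrs[v].add(u)

    def descendants(w):
        seen, stack = set(), [w]
        while stack:
            for c in children[stack.pop()]:
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def blocked(path):
        # Only interior nodes can block; endpoints lie in A or B, not in C.
        for prev, w, nxt in zip(path, path[1:], path[2:]):
            converging = (prev, w) in edge_set and (nxt, w) in edge_set
            if converging:
                if w not in C and not (descendants(w) & C):
                    return True                # first condition
            elif w in C:
                return True                    # second condition
        return False

    def simple_paths(path, target):
        if path[-1] == target:
            yield path
            return
        for n in nbrs[path[-1]]:
            if n not in path:
                yield from simple_paths(path + [n], target)

    return all(blocked(p) for a in A for b in B
               for p in simple_paths([a], b))

# Hypothetical examples: a collider 1 -> 3 <- 2 with 3 -> 4, and a chain.
collider_nodes, collider_edges = [1, 2, 3, 4], [(1, 3), (2, 3), (3, 4)]
chain_nodes, chain_edges = [1, 2, 3], [(1, 2), (2, 3)]

no_cond = d_separated(collider_nodes, collider_edges, {1}, set(), {2})
on_collider = d_separated(collider_nodes, collider_edges, {1}, {3}, {2})
on_descendant = d_separated(collider_nodes, collider_edges, {1}, {4}, {2})
chain_mid = d_separated(chain_nodes, chain_edges, {1}, {2}, {3})
chain_empty = d_separated(chain_nodes, chain_edges, {1}, set(), {3})
```

In the collider example the two parents are d-separated by the empty set, but conditioning on the collider or on its descendant opens the path; in the chain the behaviour is the opposite.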



Definition 2.6 (causal model [3]). An independency model $M$ is said to be causal if there is a
DAG $D$ such that for every three disjoint subsets $A, B, C$ of $U$, we have

$$I(A, C, B)_M \;\Leftrightarrow\; (A|C|B)_D. \tag{3}$$

It was proved by Geiger and Pearl that, for every graph-isomorph (causal model) $M$ on $U$,
there is a probability distribution $P$ on $U$ such that $M$ is precisely the CI relation induced
by $P$ [1, 2].

3 Independency logic 

To formalize the notion of axiomatization, we introduce the independency logic IC. Although IC 
is a fragment of first-order logic, we are mainly concerned with its propositional counterpart. 
The language of IC has as its alphabet of symbols: 

• variables $X_1, X_2, \ldots$;

• the constant $\emptyset$;

• the ternary predicate $I$;

• three function letters: $-$, $\cup$, $\cap$;

• the punctuation symbols (, ) and ,;

• the connectives $\neg$, $\vee$, $\wedge$.

Terms in the independency logic are defined as follows. 

Definition 3.1 (term). A term in IC is defined as follows. 

(i) The constant and the variables are terms.

(ii) If $T_1, T_2$ are terms in IC, then $-T_1$, $T_1 \cup T_2$, $T_1 \cap T_2$ are terms in IC.

(iii) The set of all terms is generated as in (i) and (ii).

Using the unique predicate $I$, we can form atomic formulas.

Definition 3.2 (atom, literal, clause). An atom in IC is defined by: if $T_i$ ($i = 1, 2, 3$) are
terms in IC, then $I(T_1, T_2, T_3)$ is an atom. A literal is defined to be an atom (called positive
literal) or its negation (called negative literal). A clause is the disjunction of a finite set of
literals.

Formulas in IC are defined in the standard way. 

Definition 3.3 (formula). A formula in IC is an expression involving atoms and connectives
$\neg$, $\wedge$, $\vee$, which can be formed using the rules:

(i) Any atom is a formula.
(ii) If $A$ and $B$ are formulas, then so are $(\neg A)$, $(A \wedge B)$, and $(A \vee B)$.

In the rest of this paper, we shall sometimes omit parentheses, as long as no ambiguity is 
introduced. 

Since implication statements are convenient for expressing inference rules, we define $(A \to B)$
as an abbreviation of $((\neg A) \vee B)$. As a consequence, each clause

$$\bigvee_{i=1}^{k} \neg I(T_{1i}, T_{2i}, T_{3i}) \;\vee\; \bigvee_{j=1}^{l} I(T_{1,k+j}, T_{2,k+j}, T_{3,k+j}) \tag{4}$$

can be equivalently represented as an implication (or rule)

$$\bigwedge_{i=1}^{k} I(T_{1i}, T_{2i}, T_{3i}) \;\to\; \bigvee_{j=1}^{l} I(T_{1,k+j}, T_{2,k+j}, T_{3,k+j}). \tag{5}$$

Clauses are of particular importance in axiomatization of independency models. 



Definition 3.4 (Horn and disjunctive clauses). For a clause $C$ of form Eq. (4), $C$ is called a
Horn clause if $l \le 1$, and called disjunctive otherwise.

Above we introduced the syntactic part of IC. Next we turn to semantic notions. 

Definition 3.5 (valuation). Let $M$ be an independency model defined on $U$. A valuation in $M$ is
a function $v : \{X_1, X_2, \ldots\} \to 2^U$.

Valuations can be extended in a natural way to terms in IC. 

Definition 3.6 (valid valuation). Let $A$ be a formula, and let $M$ be an independency model
defined on $U$. A valuation $v$ in $M$ is valid for $A$ if for each atom $I(T_1, T_2, T_3)$
appearing in $A$, $v(T_1)$, $v(T_2)$, and $v(T_3)$ are pairwise disjoint, where $v(T)$ is the
valuation of $T$ in $M$.

The notion of satisfaction is defined in the standard way. Note that if v is valid for A in M, then 
it is also valid for any sub-formula B of A in M. The following definition is therefore well-defined. 

Definition 3.7 (satisfaction). Let $A$ be a formula, and let $M$ be an independency model defined
on $U$. A valuation $v$ in $M$ is said to satisfy $A$ if $v$ is valid for $A$ and it can be shown
inductively to do so under the following conditions.

• $v$ satisfies atom $I(T_1, T_2, T_3)$ if $(v(T_1), v(T_2), v(T_3)) \in M$.

• $v$ satisfies $\neg B$ if $v$ does not satisfy $B$.

• $v$ satisfies $B \vee C$ if either $v$ satisfies $B$ or $v$ satisfies $C$.

• $v$ satisfies $B \wedge C$ if $v$ satisfies both $B$ and $C$.

We say $M$ satisfies $A$, in notation $M \models A$, if all valid valuations of $A$ in $M$ satisfy $A$.

The following proposition is a consequence of the definition of $\to$.

Proposition 3.1. Let $A, B$ be two formulas, and let $M$ be an independency model defined on $U$.
Then $M \models A \to B$ iff for any valid valuation $v$ of $A \to B$ in $M$, $v$ satisfies $A$
implies $v$ satisfies $B$.

For a clause we have the following characterization. 

Corollary 3.1. Let $C$ be a clause of form Eq. (4), and let $M$ be an independency model defined
on $U$. Then $M \models C$ iff the following condition holds:

• for any valid valuation $v$ of $C$ in $M$, if $(v(T_{1i}), v(T_{2i}), v(T_{3i})) \in M$ for all
$1 \le i \le k$, then $(v(T_{1j}), v(T_{2j}), v(T_{3j})) \in M$ for some $k + 1 \le j \le k + l$.
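The condition of Corollary 3.1 lends itself to a brute-force check on a small ground set: enumerate all valuations of the variables occurring in a clause, skip those that are not valid, and look for a counterexample. The sketch below is only an illustration. The tuple encoding of terms, the function names, and the two toy models are assumptions, and the complement $-T$ is interpreted relative to $U$, which the text does not spell out.

```python
from itertools import product

def evaluate(term, v, U):
    """Extend a valuation v (variable name -> subset of U) to a term."""
    op = term[0]
    if op == "var":
        return v[term[1]]
    if op == "empty":
        return frozenset()
    if op == "comp":                  # ASSUMPTION: complement relative to U
        return U - evaluate(term[1], v, U)
    if op == "union":
        return evaluate(term[1], v, U) | evaluate(term[2], v, U)
    if op == "inter":
        return evaluate(term[1], v, U) & evaluate(term[2], v, U)
    raise ValueError(f"unknown term: {term!r}")

def subsets(U):
    items = sorted(U)
    return [frozenset(x for x, keep in zip(items, mask) if keep)
            for mask in product((False, True), repeat=len(items))]

def pairwise_disjoint(triple):
    a, c, b = triple
    return not (a & c or a & b or c & b)

def satisfies_clause(M, U, variables, antecedents, consequents):
    """Check the condition of Corollary 3.1 by enumerating all valuations."""
    U = frozenset(U)
    for values in product(subsets(U), repeat=len(variables)):
        v = dict(zip(variables, values))
        ante = [tuple(evaluate(t, v, U) for t in atom) for atom in antecedents]
        cons = [tuple(evaluate(t, v, U) for t in atom) for atom in consequents]
        if not all(map(pairwise_disjoint, ante + cons)):
            continue                  # v is not a valid valuation for the clause
        if all(t in M for t in ante) and not any(t in M for t in cons):
            return False              # counterexample found
    return True

# Hypothetical example: the symmetry rule I(X, Z, Y) -> I(Y, Z, X).
X, Y, Z = ("var", "X"), ("var", "Y"), ("var", "Z")
empty, one, two = frozenset(), frozenset({1}), frozenset({2})
M_sym = {(one, empty, two), (two, empty, one)}
M_asym = {(one, empty, two)}

sym_ok = satisfies_clause(M_sym, {1, 2}, ["X", "Y", "Z"], [(X, Z, Y)], [(Y, Z, X)])
asym_ok = satisfies_clause(M_asym, {1, 2}, ["X", "Y", "Z"], [(X, Z, Y)], [(Y, Z, X)])
```

The enumeration visits $2^{|U| \cdot n}$ valuations for $n$ variables, so it is usable only for toy models; it is meant to make the semantics concrete rather than to serve as a practical decision procedure.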

Given a family of independency models $\mathcal{M}$ and a (finite or countable) set of formulas
$F$ in IC, we now formalize the notion that $\mathcal{M}$ can be axiomatically characterized by $F$.

Definition 3.8 (axiomatization). A family of independency models $\mathcal{M}$ can be completely
characterized by a set of formulas $F$ in IC if the following condition holds for any independency
model $M$:

$$M \in \mathcal{M} \;\Leftrightarrow\; (\forall B \in F)\; M \models B. \tag{6}$$

We say $\mathcal{M}$ has a finite (countable, resp.) axiomatization if it can be completely
characterized by a finite (countable, resp.) set of formulas in IC.

Since each formula in IC is semantically equivalent to the conjunction of a finite set of clauses,
we need only consider clauses.

Proposition 3.2. A family of independency models $\mathcal{M}$ has a finite (countable, resp.)
axiomatization iff it can be completely characterized by a finite (countable, resp.) set of clauses
in IC.

Analogous to propositional calculus, we have the completeness theorem for IC.



Theorem 3.1. Suppose a family of independency models $\mathcal{M}$ is axiomatically characterized
by $F$. Let $\Sigma$ be a set of formulas, $A$ be a formula. Then the following two conditions are
equivalent.

(1) $\Sigma \models_{\mathcal{M}} A$: for any model $M$ in $\mathcal{M}$, if $M$ satisfies all
formulas in $\Sigma$, it also satisfies $A$;

(2) $\Sigma \vdash_{F} A$: $A$ is deducible$^{1}$ from $\Sigma$ by using axioms in $F$.

In particular, we have

Corollary 3.2. Let $\mathcal{M}$ and $F$ be as in the above theorem. For a set $\Gamma$ of
independence statements $\{I(T_{i1}, T_{i2}, T_{i3}) : 1 \le i \le k\}$ and an independence
statement $\gamma = I(T_{k+1,1}, T_{k+1,2}, T_{k+1,3})$, we have $\Gamma \models_{\mathcal{M}} \gamma$
iff $\gamma$ is deducible from $\Gamma$ by using axioms in $F$.

4 Sub-models 

In this section, we consider sub-independency models.

Definition 4.1 (sub-model). Let $M$ be an independency model defined on $U$, and let $V$ be a
subset of $U$. We call $M|_V = \{(A, C, B) \in M : A, B, C \subseteq V\}$ the sub-independency
model (or simply sub-model) of $M$ on $V$.

The following result asserts that if an independency model satisfies a formula, so does its
sub-model.

Proposition 4.1. Let $A$ be a formula, and let $M$ be an independency model defined on $U$. For
any subset $V$ of $U$, if $M$ satisfies $A$, then so does $M|_V$.

Proof. This is because any valuation $v$ in $M|_V$ is also a valuation in $M$. □

An interesting question arises naturally. Given an independency model $M$ on $U$, suppose $M$
is a CI relation (or graph-isomorph, or causal model), and $V \subseteq U$. Is the sub-model
$M|_V$ also a CI relation (or graph-isomorph, or causal model)? This is important for a family of
independency models $\mathcal{M}$ to be axiomatizable. Indeed, by Proposition 4.1, if
$\mathcal{M}$ is not closed under sub-models, then it cannot be axiomatically characterized by any
set of formulas.

Given a joint probability $P(\cdot)$, write $M$ for the CI relation on $U$ induced by $P(\cdot)$,
i.e. for any pairwise disjoint subsets $A, B, C$ of $U$ the tuple $(A, C, B)$ is an instance of $M$
if and only if $A$ and $B$ are conditionally independent given $C$ (see Def. 2.1 and Def. 2.2). We
claim that, for a nonempty subset $V$ of $U$, $M|_V$, the restriction of $M$ to $V$, is a CI
relation on $V$. This is because $M|_V$ is induced by the joint probability $P|_V(\cdot)$, which
is obtained from $P(\cdot)$ by computing the marginal probability of $P$ on $V$.

A similar conclusion holds for graph-isomorphs. 

Lemma 4.1. Let $G = (U, E)$ be an undirected graph on $U$, and let $V$ be a nonempty proper subset
of $U$. Define an undirected graph $G' = (V, E')$ as follows: for any two nodes
$\alpha, \beta \in V$, $(\alpha, \beta) \in E'$ iff there is a path $p$ from $\alpha$ to $\beta$ in
$G$ such that all other nodes in $p$ are contained in $U - V$. Then

$$(\alpha|C|\beta)_G \;\Leftrightarrow\; (\alpha|C|\beta)_{G'} \tag{7}$$

for any $\alpha, \beta \in V$, and any $C \subseteq V$.

Proof. Suppose $(\alpha|C|\beta)_{G'}$. We show $C$ separates $\alpha$ from $\beta$ in $G$. For
each path

$$p = \alpha \gamma_1 \gamma_2 \cdots \gamma_m \beta \quad (m \ge 0)$$

in $G$, we show $m \ge 1$ and some $\gamma_i$ is contained in $C$. Since $(\alpha|C|\beta)_{G'}$,
$(\alpha, \beta)$ is not an edge in $G'$. By definition, we know (i) $(\alpha, \beta)$ is not an
edge in $G$, hence $m \ge 1$; and (ii) some node $\gamma_i$ must be contained in $V$. Suppose
$\gamma_{i_1}, \gamma_{i_2}, \ldots, \gamma_{i_k}$ ($1 \le i_1 < i_2 < \cdots < i_k \le m$) are all
those nodes in $V$. Since the nodes between $\gamma_{i_u}$ and $\gamma_{i_{u+1}}$ (and those between
$\alpha$ and $\gamma_{i_1}$, and between $\gamma_{i_k}$ and $\beta$) must be contained in $U - V$,
by definition of $G'$ we know $(\alpha, \gamma_{i_1})$, $(\gamma_{i_u}, \gamma_{i_{u+1}})$,
$(\gamma_{i_k}, \beta)$ are all edges in $G'$. Therefore

$$p' = \alpha \gamma_{i_1} \gamma_{i_2} \cdots \gamma_{i_k} \beta$$

is a path in $G'$. By $(\alpha|C|\beta)_{G'}$, we know some $\gamma_{i_u}$ must be in $C$. This
shows that $C$ separates $\alpha$ from $\beta$ for every path $p$ in $G$.

On the other hand, suppose $(\alpha|C|\beta)_G$. We show $C$ separates $\alpha$ from $\beta$ in
$G'$. For each path

$$p = \alpha \gamma_1 \gamma_2 \cdots \gamma_m \beta \quad (m \ge 0)$$

in $G'$, we show some $\gamma_i$ is contained in $C$.

Write $\gamma_0$ and $\gamma_{m+1}$ for $\alpha$ and $\beta$. Note that if
$(\gamma_i, \gamma_{i+1})$ is not an edge in $G$, then by definition there is a path $p_i$ in $G$
from $\gamma_i$ to $\gamma_{i+1}$ such that all other nodes in $p_i$ are contained in $U - V$.
Concatenating the paths $p_0, p_1, \ldots, p_m$ we obtain a new 'path' in $G$ from $\alpha$ to
$\beta$ which satisfies the following condition:

$$\text{Each node is either in } p \text{ or in } U - V. \tag{8}$$

In this 'path' identical nodes may occur several times. With proper modifications, we obtain a
shortened path $p'$ in $G$ which also satisfies condition (8). By our assumption that
$(\alpha|C|\beta)_G$, we know some node in $p'$ must be contained in $C$. But by $C \subseteq V$,
this shows some $\gamma_i$ must be in $C$. Hence $C$ separates $\alpha$ from $\beta$ for every path
$p$ in $G'$. □

Figure 1: A DAG D on U = {0, 1, 2, 3, 4}.

$^{1}$ In the sense of logical deduction.

Proposition 4.2. Let $M$ be a graph-isomorph on $U$. For a nonempty subset $V$ of $U$, $M|_V$ is
also a graph-isomorph.

Proof. Suppose $M$ is represented by an undirected graph $G = (U, E)$. We show $M|_V$ can be
represented by the undirected graph $G' = (V, E')$ constructed in Lemma 4.1, i.e. for any
pairwise disjoint subsets $A, B, C$ of $V$, we have $I(A, C, B)_{M|_V}$ iff $(A|C|B)_{G'}$. By
definition of graph separation, for a graph $G^*$ we know $(A|C|B)_{G^*}$ iff
$(\forall \alpha \in A)(\forall \beta \in B)\,(\alpha|C|\beta)_{G^*}$. By Lemma 4.1, for any
$\alpha, \beta \in V$ and any $C \subseteq V$ we have
$(\alpha|C|\beta)_G \Leftrightarrow (\alpha|C|\beta)_{G'}$. Therefore $(A|C|B)_{G'}$ iff
$(A|C|B)_G$ for any pairwise disjoint subsets $A, B, C$ of $V$. Since $M$ is representable by $G$,
it is clear that $M|_V$ is also representable by $G'$. □
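The construction of $G'$ in Lemma 4.1 amounts to a graph search that is only allowed to pass through nodes outside $V$. The following sketch builds $G'$ in this way and checks equivalence (7) exhaustively on one small graph; it is a sanity check on an example, not a substitute for the proof. The function names and the example graph (a hidden hub node 5 joined to 1, 2, 3, plus the edge 3 - 4) are assumptions made for illustration.

```python
from itertools import combinations

def separated(adj, a, b, C):
    """True if every path from a to b in the undirected graph meets C."""
    seen, stack = {a}, [a]
    while stack:
        for n in adj[stack.pop()]:
            if n not in seen and n not in C:
                seen.add(n)
                stack.append(n)
    return b not in seen

def restrict(adj, V):
    """Build G' on V: a, b adjacent iff a path in G links them via U - V only."""
    V = set(V)
    new_adj = {a: set() for a in V}
    for a in V:
        seen, stack = {a}, [a]
        while stack:
            for n in adj[stack.pop()]:
                if n in seen:
                    continue
                seen.add(n)
                if n in V:
                    new_adj[a].add(n)   # reached a node of V: edge of G'
                else:
                    stack.append(n)     # node outside V: keep walking
    return new_adj

def lemma_holds(adj, V):
    """Compare separation in G and G' for all a, b in V and C within V."""
    sub = restrict(adj, V)
    for a, b in combinations(sorted(V), 2):
        rest = sorted(set(V) - {a, b})
        for r in range(len(rest) + 1):
            for C in combinations(rest, r):
                if separated(adj, a, b, set(C)) != separated(sub, a, b, set(C)):
                    return False
    return True

# Hypothetical example: node 5 is outside V and is adjacent to 1, 2 and 3.
G = {1: {5}, 2: {5}, 3: {4, 5}, 4: {3}, 5: {1, 2, 3}}
V = {1, 2, 3, 4}
G_prime = restrict(G, V)
agrees = lemma_holds(G, V)
```

Removing the hub turns its three neighbours into a clique in $G'$, which is what keeps them inseparable by subsets of $V$, exactly as they are in $G$.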

But the following example shows that this is not true for causal models. 

Example 4.1. Let $M$ be the causal model representable by the DAG $D$ given in Fig. 1, and let
$V = \{1, 2, 3, 4\}$. The sub-independency model $M|_V$ is not representable by any DAG.

To prove this conclusion, we use the notation $D(\alpha, \beta)$ to express the fact that in
$M|_V$ there is no $C \subseteq V$ such that $I(\alpha, C, \beta)$ is true. It is clear that the
following statements hold in $M|_V$.

• $D(1, 2)$, $D(3, 4)$, $D(2, 3)$;

• $I(1, \emptyset, 3)$, $I(1, 4, 3)$, $I(2, \emptyset, 4)$, $I(2, 1, 4)$.

In a DAG $D = (U, E)$, for any two nodes $\alpha, \beta \in U$, it is well known that
$(\alpha, \beta) \in E$ or $(\beta, \alpha) \in E$ iff no $C \subseteq U$ can d-separate $\alpha$
from $\beta$.

Suppose $M|_V$ is representable by some DAG $D'$ defined on $V$. By $D(1, 2)$, $D(3, 4)$, and
$D(2, 3)$ we know that in $D'$ node 1 is connected to node 2, node 2 is connected to node 3, and
node 3 is connected to node 4. This shows that $p = 1234$ is a path from node 1 to node 4. Since
$I(1, \emptyset, 3)_{M|_V}$ holds and $p' = 123$ is a path from node 1 to node 3, the empty set
must block $p'$, which is possible only if node 2 has converging arrows; hence in $D'$ we should
have $1 \to 2 \leftarrow 3$. Similarly, for nodes 2 and 4, we should also have
$2 \to 3 \leftarrow 4$ in $D'$. This is impossible since $2 \leftarrow 3$ and $2 \to 3$ cannot
appear together in the same DAG.

This proves that $M|_V$ has no DAG representation, hence is not causal.
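The argument can also be checked mechanically by enumerating every DAG on four nodes. The sketch below does so under an explicit assumption: since the example depends on the drawing of $D$, the code takes $D$ to be the DAG $1 \to 2 \leftarrow 0 \to 3 \leftarrow 4$, which is consistent with the statements listed in Example 4.1 but is an assumption here rather than something read off Fig. 1. d-separation is tested with the moralized ancestral graph criterion, a standard equivalent of Definition 2.5, and only statements between single nodes are compared, which suffices because $(A|C|B)_D$ holds iff it holds for every pair of nodes from $A$ and $B$. All function names are illustrative.

```python
from itertools import combinations, product

def parent_map(nodes, edges):
    parents = {n: set() for n in nodes}
    for u, v in edges:                       # (u, v) means u -> v
        parents[v].add(u)
    return parents

def d_separated(parents, a, b, cond):
    """d-separation via the moralized ancestral graph of {a, b} and cond."""
    anc, stack = {a, b} | cond, [a, b, *cond]
    while stack:                             # close under taking parents
        for p in parents[stack.pop()]:
            if p not in anc:
                anc.add(p)
                stack.append(p)
    adj = {n: set() for n in anc}
    for n in anc:
        for p in parents[n]:                 # keep parent-child links
            adj[n].add(p)
            adj[p].add(n)
        for p, q in combinations(parents[n], 2):
            adj[p].add(q)                    # marry parents of a common child
            adj[q].add(p)
    seen, stack = {a}, [a]
    while stack:                             # search while avoiding cond
        for m in adj[stack.pop()]:
            if m not in seen and m not in cond:
                seen.add(m)
                stack.append(m)
    return b not in seen

def is_acyclic(parents):
    remaining = set(parents)
    while remaining:
        sources = {n for n in remaining if not (parents[n] & remaining)}
        if not sources:
            return False
        remaining -= sources
    return True

def pairwise_model(parents, V):
    """All (a, C, b), a < b in V, C within V - {a, b}, with C d-separating them."""
    model = set()
    for a, b in combinations(sorted(V), 2):
        rest = sorted(set(V) - {a, b})
        for r in range(len(rest) + 1):
            for C in combinations(rest, r):
                if d_separated(parents, a, b, set(C)):
                    model.add((a, frozenset(C), b))
    return model

def count_representing_dags(V, target):
    """Count DAGs on V whose pairwise d-separation model equals target."""
    pairs = list(combinations(sorted(V), 2))
    count = 0
    for choice in product((0, 1, 2), repeat=len(pairs)):   # none, a->b, b->a
        edges = [(a, b) if c == 1 else (b, a)
                 for (a, b), c in zip(pairs, choice) if c]
        parents = parent_map(V, edges)
        if is_acyclic(parents) and pairwise_model(parents, V) == target:
            count += 1
    return count

# ASSUMED DAG, standing in for Fig. 1: 1 -> 2 <- 0 -> 3 <- 4.
D = parent_map(range(5), [(1, 2), (0, 2), (0, 3), (4, 3)])
V = [1, 2, 3, 4]
target = pairwise_model(D, V)
representing = count_representing_dags(V, target)
```

Under the assumed DAG, `target` contains the four independence statements of the example, contains no statement separating nodes 2 and 3, and no DAG on $V$ reproduces it, in agreement with the conclusion above.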

As a corollary of this example and Prop. 4.1, we have

Theorem 4.1. Causal models have no complete axiomatic characterization. 

Proof. Let $F = \{A_1, A_2, \ldots\}$ be the set of clauses that are satisfied by all causal
models. In particular, the causal model $M$ given in Example 4.1 satisfies each $A_i$. By
Prop. 4.1, we know $M|_V$ also satisfies each $A_i$. Since $M|_V$ is not causal, the infinite set
$F$ (let alone finite subsets of $F$) cannot provide a complete characterization for causal
models. □

5 Discussion 

We have shown that it is impossible to give a complete axiomatic characterization for causal
models. This is different from the results obtained in [5] and [6]. In [5], Pearl and Paz proved
that graph-isomorphs have a complete characterization by five axioms (4 Horn, 1 disjunctive). Since
a sub-model of a graph-isomorph also satisfies these axioms, it is clear that sub-models of
graph-isomorphs are graph-isomorphs. We gave a method for constructing such a graph representation.

Studeny [6] showed that there is no finite axiomatization for CI relations by using Horn clauses.
More positively, he also showed that there exists an infinite set of Horn clauses that completely
characterizes CI relations. But it is still unknown whether CI relations have a finite
axiomatization by using arbitrary clauses (Horn or disjunctive).

The class of sub-models of causal models seems useful when (unknown) hidden variables are
involved. As for axiomatization, a result by Geiger (see [3, Exercise 3.7]) suggests that it may
have no finite characterization by Horn axioms.